Exploiting Diversity for Natural Language Parsing

نویسنده

  • John C. Henderson
چکیده

The popularity of applying machine learning methods to computational linguistics problems has produced a large supply of trainable natural language processing systems. Most problems of interest have an array of off-the-shelf products or downloadable code implementing solutions using various techniques. Where these solutions are developed independently, it is observed that their errors tend to be independently distributed. This thesis is concerned with approaches for capitalizing on this situation in a sample problem domain, Penn Treebank-style parsing. The machine learning community provides techniques for combining outputs of classifiers, but parser output is more structured and interdependent than classifications. To address this discrepancy, two novel strategies for combining parsers are used: learning to control a switch between parsers and constructing a hybrid parse from multiple parsers' outputs. Off-the-shelf parsers are not developed with an intention to perform well in a collaborative ensemble. Two techniques are presented for producing an ensemble of parsers that collaborate. All of the ensemble members are created using the same underlying parser induction algorithm, and the method for producing complementary parsers is only loosely constrained by that chosen algorithm.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

متن کامل

Annotation Projection-based Representation Learning for Cross-lingual Dependency Parsing

Cross-lingual dependency parsing aims to train a dependency parser for an annotation-scarce target language by exploiting annotated training data from an annotation-rich source language, which is of great importance in the field of natural language processing. In this paper, we propose to address cross-lingual dependency parsing by inducing latent crosslingual data representations via matrix co...

متن کامل

Eecient Full Parsing for Information Extraction

In this paper we propose an eecient deterministic approach to parsing texts for Information Extraction purposes that has been pursued by the natural language research group at ITC/IRST. The syntactic processing is split in two phases. During the rst phase predicate/argument structures are recognized, whereas modiier attachment is delayed. In the second phase we use the syntactic/semantic constr...

متن کامل

برچسب‌زنی خودکار نقش‌های معنایی در جملات فارسی به کمک درخت‌های وابستگی

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

متن کامل

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره cs.CL/0006012  شماره 

صفحات  -

تاریخ انتشار 2000